Live freelance tracking. Raw descriptions turned into structured data. Find your next tech project without the noise.
upwork.com 🟢 2026-05-10
🔹 Extract contact information from Enterprise Agreements and identify signatories via OCR
👤 Client: 🇦🇺 Australia, member since 2016-09-30
💰 Price: ****
🚩 Problem: Scrape and enrich a directory of Enterprise Agreements to capture key fields and identify signatories within PDFs.
📦 Existing: Not specified
Specifications:
[Target] Scrape the Fair Work Commission (FWC) website for agreement details, including the top-level fields shown on the search results page.
[Method] Use web scraping tools such as BeautifulSoup or Puppeteer, then run OCR on the agreement PDFs to extract signatory information.
[UI/UX] Not applicable
[Stack] Python (BeautifulSoup) or Node.js (Puppeteer) for scraping, Tesseract (OCR), PyMuPDF (PDF parsing), Elasticsearch (storage)
[Security] Handle data securely and comply with applicable privacy laws. Encrypt sensitive data in transit and at rest.
[Format] JSON or CSV for structured storage of scraped data.
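The [Target], [Method], and [Format] items could fit together roughly as follows. This is a minimal sketch, not a working FWC scraper: the results-page markup (`<tr class="result">` rows with cells in a fixed order), the field names in `FIELDS`, and the sample row are all hypothetical, because the listing does not describe the real page. It uses the standard library's `html.parser` in place of BeautifulSoup so it runs without dependencies; with BeautifulSoup, the parser class would shrink to a loop over `soup.select("tr.result")`.

```python
"""Sketch: pull top-level fields from a search-results page, emit JSON and CSV.

Assumes hypothetical markup: each agreement is a <tr class="result"> whose
<td> cells appear in the order given by FIELDS.
"""
import csv
import io
import json
from html.parser import HTMLParser

# Hypothetical column order; adjust to the real results page.
FIELDS = ["agreement_title", "agreement_id", "approval_date", "employer"]


class ResultsTableParser(HTMLParser):
    """Collects the text of each <td> inside <tr class="result"> rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being parsed, or None
        self._cell = None  # text fragments of the cell being parsed, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr" and ("class", "result") in attrs:
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(dict(zip(FIELDS, self._row)))
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_results(html):
    """Return one dict per result row, keyed by FIELDS."""
    parser = ResultsTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def to_csv(records):
    """Serialise records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up sample markup for illustration only.
    sample = """
    <table>
      <tr><th>Title</th></tr>
      <tr class="result"><td>Example Pty Ltd Enterprise Agreement</td>
          <td>EXAMPLE-ID</td><td>2024-03-01</td><td>Example Pty Ltd</td></tr>
    </table>
    """
    records = parse_results(sample)
    print(json.dumps(records, indent=2))
    print(to_csv(records))
```

Keeping the parsed records as plain dicts means the same list can be written to JSON, CSV, or indexed into Elasticsearch without reshaping.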
Workflow:
1. Set up web scraping to access the FWC website and extract top-level fields from the search results page.
2. Implement OCR using Tesseract to identify signatories in PDFs, including their titles if stated.
3. Develop a secondary verification step that checks for fuzzy matches of names against company employees (e.g., 'J Doe' -> 'L Doe').
4. Extract employer representative information by filtering out irrelevant names and ensuring only the correct contact is captured.
5. Optionally, use Apollo or a similar service to search each company for relevant contacts matching the provided title keywords.
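Once Tesseract has produced plain text for each page (for example via pytesseract's `image_to_string` on page images rendered with PyMuPDF, not shown here), steps 2 and 4 come down to locating signature blocks and keeping only the employer's. The sketch below assumes a hypothetical layout in which each block opens with a "Signed for and on behalf of <party>" line followed by "Name:" and "Position:" or "Title:" lines; real agreements vary, and OCR noise will likely need looser patterns. `Signatory`, `parse_signatories`, and `employer_signatories` are illustrative names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical signature-block layout; adjust the patterns to real agreements.
# [ \t] (not \s) keeps each pattern within a single line of OCR output.
FLAGS = re.IGNORECASE | re.MULTILINE
BLOCK_START = re.compile(
    r"^[ \t]*signed[ \t]+for[ \t]+and[ \t]+on[ \t]+behalf[ \t]+of[ \t]+(?P<party>.+?)[ \t]*$",
    FLAGS,
)
NAME_RE = re.compile(r"^[ \t]*name[ \t]*[:\-][ \t]*(?P<value>.+?)[ \t]*$", FLAGS)
TITLE_RE = re.compile(
    r"^[ \t]*(?:position|title)[ \t]*[:\-][ \t]*(?P<value>.+?)[ \t]*$", FLAGS
)


@dataclass
class Signatory:
    party: str
    name: str
    title: Optional[str]  # None when the block states no title


def parse_signatories(ocr_text: str) -> List[Signatory]:
    """Split OCR text into signature blocks and pull name and title from each."""
    starts = list(BLOCK_START.finditer(ocr_text))
    results = []
    for i, match in enumerate(starts):
        # A block runs until the next block header or the end of the text.
        end = starts[i + 1].start() if i + 1 < len(starts) else len(ocr_text)
        block = ocr_text[match.end():end]
        name = NAME_RE.search(block)
        if name is None:
            continue  # no readable name in this block; skip it
        title = TITLE_RE.search(block)
        results.append(
            Signatory(
                party=match.group("party"),
                name=name.group("value"),
                title=title.group("value") if title else None,
            )
        )
    return results


def employer_signatories(signatories: List[Signatory], employer_name: str) -> List[Signatory]:
    """Keep blocks signed on behalf of the employer (case-insensitive substring match)."""
    key = employer_name.casefold()
    return [s for s in signatories if key in s.party.casefold()]


if __name__ == "__main__":
    # Made-up OCR output for illustration only.
    sample = (
        "Signed for and on behalf of Example Pty Ltd\n"
        "Name: Jane Doe\n"
        "Position: Chief Executive Officer\n"
        "\n"
        "Signed for and on behalf of the Example Union\n"
        "Name: John Smith\n"
    )
    for s in employer_signatories(parse_signatories(sample), "Example Pty Ltd"):
        print(s)
```

Filtering on the party line is one simple way to drop union or employee representatives; matching the party against the employer name already scraped from the results page keeps the two steps consistent.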
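For step 3, one possible fuzzy-matching approach uses `difflib.SequenceMatcher` from the standard library, weighting the surname more heavily than the given name and comparing an initial only against the first letter of a full given name. The 0.7/0.3 weights, the 0.8 default threshold, and the function names are assumptions to be tuned against real employee lists, not a prescribed method.

```python
import re
from difflib import SequenceMatcher
from typing import Iterable, List, Tuple


def normalise(name: str) -> str:
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", name).casefold().split())


def name_similarity(a: str, b: str) -> float:
    """Score in [0, 1]; surname similarity is weighted above the given name."""
    ta, tb = normalise(a).split(), normalise(b).split()
    if not ta or not tb:
        return 0.0
    surname = SequenceMatcher(None, ta[-1], tb[-1]).ratio()
    given_a, given_b = " ".join(ta[:-1]), " ".join(tb[:-1])
    if given_a and given_b:
        # Compare only as many characters as the shorter given-name part,
        # so an initial ('j') is scored against 'john' as 'j' vs 'j'.
        n = min(len(given_a), len(given_b))
        given = SequenceMatcher(None, given_a[:n], given_b[:n]).ratio()
    else:
        given = 0.5  # one side has no given name: neutral score
    return 0.7 * surname + 0.3 * given


def fuzzy_matches(
    extracted: str, employees: Iterable[str], threshold: float = 0.8
) -> List[Tuple[str, float]]:
    """Return (employee, score) pairs at or above threshold, best first."""
    scored = [(e, name_similarity(extracted, e)) for e in employees]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

With these weights, two names that share a surname but differ only in initial (such as 'J Doe' and 'L Doe') score 0.7, so they pass only if the threshold is lowered to 0.7, which would also accept every same-surname pair. That trade-off is a reason to keep this as a secondary check that flags candidates for review rather than auto-accepting them.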